This project investigates the relationship between county-level age demographics and economic outcomes, specifically focusing on debt-to-income ratios. As counties across the U.S. experience demographic shifts, particularly the aging of the population, local governments face fiscal pressures tied to rising expenditures on retirement benefits and healthcare, coupled with reduced tax revenue from an aging workforce. We hypothesize that a higher proportion of older residents correlates with and potentially drives higher county-level debt-to-income ratios.
This analysis is motivated by the need to understand how demographic trends influence fiscal health at the local level. Counties with larger elderly populations may prioritize policies that cater to retirees—such as increased spending on pensions—potentially exacerbating debt-to-income ratios. Conversely, counties with younger populations may exhibit different fiscal dynamics, such as higher workforce participation and tax revenues.
The primary research question question of this research: Does the age distribution of a county’s population have a causal impact on debt-to-income ratios, potentially driven by shifts in local government spending and tax revenues?
In this research, we assume that tax revenues can serve as a proxy for local government policies. This assumption is based on the idea that tax revenue levels reflect fiscal priorities, including how resources are allocated and expenditures are balanced to address demographic shifts and economic demands.
We begin by preparing the dataset for econometric analysis. The dataset includes various demographic and economic variables at the county level across multiple years. The dataset is cleaned and transformed to make it suitable for panel data analysis.
This is a descriptive regression analysis where we use ElasticNet regression to quantify the associations and perform predictive analysis. The motivation behind this study is to understand how different factors influence the debt-to-income ratio at the county level and use the model to predict future values of this ratio. In addition to this, we use the variables selected in the elasticnet model to create a linear model on the panel data for potential causal inference.
To better understand this relationship from the standpoint of causal inference, we also applied a fixed effects panel regression model. This model controls for individual heterogeneity at the county level, allowing for more precise estimation of relationships between the variables and the target debt-to-income ratio. The motivation here is to explore how these factors change over time within each county, thereby accounting for unobserved time-invariant characteristics.
The dataset is first transformed into a panel format by creating a
unique identifier for each county based on its state and county name. We
then ensure that the data is structured correctly for panel analysis,
using UniqueID and Year as the panel indices.
After that, we standardize all numeric variables within each county
(identified by UniqueID) to account for differing scales of
the variables.
We selected the Annual_Debt_to_income_ratio_low variable
as the target for prediction. To predict this variable, we use
ElasticNet regression, which is a regularization technique that combines
both Lasso (L1) and Ridge (L2) penalties. ElasticNet is particularly
useful when there are many correlated predictors.
The data is split into a training set (80%) and a test set (20%). We
then fit the ElasticNet model using cross-validation to find the optimal
regularization parameter, lambda. The cross-validation
results indicate the best value of lambda to minimize prediction
error.
Annual_Debt_to_income_ratio_low was chosen as the
outcome variable because it serves as a key indicator of household
financial health at the county level. This metric reflects the balance
between debt obligations and income, providing insight into the economic
well-being of a county’s residents on average. Predicting this ratio has
particular import to several groups. For instance, policymakers can use
it as a tool for identifying financial stress in order to specify policy
prescriptions. Similarly local governments can make use of these
predictions to more efficiently allocate resources. Business executives
may find the debt-to-income ratio important when it comes to identifying
areas that have potential. This can affect decisions about investments,
credit offerings, and business expansion opportunities.
The ElasticNet model is fitted on the training data, and we obtain the optimal lambda value from cross-validation. The lambda that minimizes the mean squared error is:
Optimal lambda (from cross-validation): 0.0002784226
The model’s performance is evaluated on the test set by calculating the Mean Squared Error (MSE):
MSE on Test Set: 0.813388
This indicates that the model has a relatively low error, suggesting a good fit for the data.
The model’s coefficients represent the relationship between each
predictor and the target variable,
Annual_Debt_to_income_ratio_low. Some key coefficients
are:
Compensation and
Taxes on production and imports have positive coefficients,
indicating that as these variables increase, the
Annual_Debt_to_income_ratio_low tends to increase as
well.Real GDP show negative coefficients, meaning that as GDP
increases, the debt-to-income ratio tends to decrease.The coefficients help quantify the magnitude and direction of relationships between the predictors and the response variable.
The performance of the ElasticNet model is assessed through cross-validation and MSE. The optimal lambda value helps balance model complexity and predictive accuracy. The low MSE indicates that the model is able to predict the debt-to-income ratio with reasonable accuracy.
The primary focus of this analysis was predictive modeling, where we
used ElasticNet to predict the
Annual_Debt_to_income_ratio_low based on a variety of
socioeconomic and fiscal variables. The model’s performance, reflected
in the MSE, suggests that the selected variables capture important
information for predicting the target variable.
While the predictive model provides useful insights, there are limitations to its ability to capture complex, non-linear relationships between variables. Future steps could include exploring more advanced machine learning techniques or using the model for more granular forecasting. Additionally, understanding the potential sources of bias, especially in terms of omitted variables or unmeasured confounders, is important for making the results more reliable.
The process involved several steps: 1. Variable Selection: Based on prior analysis, we selected a subset of variables that were found to have significant coefficients in the ElasticNet model. These variables are likely to have the most influence on the debt-to-income ratio, and using elasticnet to select them helps reduce the likelihood of our analysis being influenced by omitted variables bias.
Data Preparation: A new dataset was created,
which includes only the UniqueID, Year,
Annual_Debt_to_income_ratio_low, and the selected
variables. This dataset was then arranged by county
(UniqueID) and year (Year).
Fixed Effects Model: A fixed effects panel
regression was run using the plm package in R, with
Annual_Debt_to_income_ratio_low as the dependent variable
and all selected variables as independent variables. The model controls
for individual (county) effects by including a “within”
specification.
While the model was able to be computed, it was riddled with multicolinearity among the regressors which made its model summary uncomputable. Removing the multicolinearly would require us to remove most of the regressors in our model, which would defeat the purpose of using elasticnet as a method of selecting regressors such to minimize risk of ommitted variables bias. We were thus unable to utilize the predictive model in service of causal inference. However, this did provide us with additional evidence that, given our chosen candidate variables of interest, that the question at hand was better suited to predictive analysis using elasticnet rather than causal inference. We therefore include the above discussion of the regression.
The fixed effects model accounts for unobserved county-level heterogeneity but does not address potential endogeneity or omitted variable bias that could still affect the estimates. Further analysis could include instrumental variable approaches or explore other econometric techniques to deal with these concerns. Additionally, the model assumes linear relationships between predictors and the outcome, which may not fully capture the complexity of the underlying dynamics.
This analysis provides a predictive framework for understanding the
relationship between various socioeconomic and fiscal factors and the
Annual_Debt_to_income_ratio_low. ElasticNet regression has
been employed to quantify these relationships and predict the
debt-to-income ratio. The results show meaningful associations between
predictors and the target variable, and the model has demonstrated good
predictive performance.
Future work could involve exploring causal inference techniques or enhancing the model with more complex machine learning algorithms to further improve prediction accuracy.
Both authors of this report made use of ChatGPT during the research and writing process. No other AI services were used. There were three main use cases for this AI. The first and primary use was for coding assistance in R. This includes translating descriptions of desired data transformations into code chunks, analyzing written code for errors, and suggesting functions for specific use cases. ChatGPT was particularly helpful in generating and modifying visual elements and figures for the report. For example, if the title of a graphic was hard to read ChatGPT could provide a list of potential improvements that could be tried or discarded. The second use of AI was in its capacity to generate writing prompts. Instructions and data were fed into the AI in order to generate ideas of what to write about as well as structure to the writing. Notably, there was intentional effort to not directly copy and paste any ChatGPT output. This applies to code as well; the majority of incorperated information generated by ChatGPT was used to inspire novel approaches or troubleshoot problems, rather than as direct inputs in the final product. A notable exception is a minority is some code snippets that were directly pulled. The final use of AI in the project was in the form of proofreading. Select sections of text were parsed by the language model and asked for spelling and grammar checks.